
    Metalibm: A Mathematical Functions Code Generator

    There are several different libraries with code for mathematical functions such as exp, log, sin, cos, etc. They provide only one implementation for each function. As accuracy and performance are linked, that approach is not optimal. Sometimes there is a need to rewrite a function's implementation with respect to a particular specification. In this paper we present a code generator for parametrized implementations of mathematical functions. We discuss the benefits of code generation for mathematical libraries and present how to implement mathematical functions. We also explain how mathematical functions are usually implemented and generalize this idea to the case of an arbitrary function with implementation parameters. Our code generator produces C code for parametrized functions within a known scheme: range reduction (domain splitting), polynomial approximation, and reconstruction. This approach can be extended to generate code for black-box functions, e.g. functions defined only by differential equations.
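
    As an illustration of that scheme, the following hand-written C sketch implements exp via range reduction, a polynomial approximation on the reduced interval, and reconstruction. The constants and the degree-4 Taylor polynomial are illustrative choices, not Metalibm output; a generator would instead fit a minimax polynomial whose degree matches the requested accuracy.

    #include <math.h>
    #include <stdio.h>

    /* exp(x) = 2^k * exp(r), with x = k*ln2 + r and |r| <= ln2/2.
     * The polynomial degree is the kind of parameter such a generator
     * exposes; plain Taylor coefficients are used here for clarity. */
    static double exp_sketch(double x)
    {
        const double LN2     = 0.6931471805599453;
        const double INV_LN2 = 1.4426950408889634;

        /* range reduction (domain splitting): x = k*ln2 + r */
        int k = (int)nearbyint(x * INV_LN2);
        double r = x - (double)k * LN2;

        /* polynomial approximation of exp(r) on [-ln2/2, ln2/2] */
        double p = 1.0 + r * (1.0 + r * (0.5 + r * (1.0/6.0 + r * (1.0/24.0))));

        /* reconstruction: scale by 2^k */
        return ldexp(p, k);
    }

    int main(void)
    {
        double x = 3.7;
        printf("sketch %.17g, libm %.17g\n", exp_sketch(x), exp(x));
        return 0;
    }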

    Satisfiability Modulo Transcendental Functions via Incremental Linearization

    In this paper we present an abstraction-refinement approach to Satisfiability Modulo the theory of transcendental functions, such as exponentiation and trigonometric functions. The transcendental functions are represented as uninterpreted in the abstract space, which is described in terms of the combined theory of linear arithmetic on the rationals with uninterpreted functions, and are incrementally axiomatized by means of upper- and lower-bounding piecewise-linear functions. Suitable numerical techniques are used to ensure that the abstractions of the transcendental functions are sound even in the presence of irrationals. Our experimental evaluation on benchmarks from verification and mathematics demonstrates the potential of our approach, showing that it compares favorably with delta-satisfiability/interval propagation and with methods based on theorem proving.
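
    To make the linearization concrete, the C sketch below (not the authors' tool) shows the two kinds of piecewise-linear lemmas for the convex function exp: a tangent line gives a sound lower bound, and a secant over an interval gives a sound upper bound on that interval. The abstraction-refinement loop and the SMT solver integration are omitted; the sample points are arbitrary.

    #include <math.h>
    #include <stdio.h>

    /* Tangent of exp at c: exp(c) + exp(c)*(x - c). Convexity makes any
     * tangent a lower bound on exp everywhere. */
    static double tangent_lower(double x, double c)
    {
        return exp(c) * (1.0 + (x - c));
    }

    /* Secant through (a, exp(a)) and (b, exp(b)): an upper bound on [a, b]. */
    static double secant_upper(double x, double a, double b)
    {
        double slope = (exp(b) - exp(a)) / (b - a);
        return exp(a) + slope * (x - a);
    }

    int main(void)
    {
        double a = 0.0, b = 2.0, x = 1.3;
        printf("%f <= exp(%f) = %f <= %f\n",
               tangent_lower(x, 1.0), x, exp(x), secant_upper(x, a, b));
        return 0;
    }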

    Automatic Estimation of Verified Floating-Point Round-Off Errors via Static Analysis

    This paper introduces a static analysis technique for computing formally verified round-off error bounds of floating-point functional expressions. The technique is based on a denotational semantics that computes a symbolic estimation of floating-point round-off errors along with a proof certificate that ensures its correctness. The symbolic estimation can be evaluated on concrete inputs using rigorous enclosure methods to produce formally verified numerical error bounds. The proposed technique is implemented in the prototype research tool PRECiSA (Program Round-off Error Certifier via Static Analysis) and used in the verification of floating-point programs of interest to NASA.
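
    As background on the rounding model such an analysis builds on (this is a plain numeric sketch, not PRECiSA's symbolic semantics or its proof certificates), the C code below propagates a first-order absolute round-off bound alongside each operation, following the standard model fl(a op b) = (a op b)(1 + d) with |d| <= u.

    #include <float.h>
    #include <math.h>
    #include <stdio.h>

    /* A value together with an accumulated absolute round-off bound. */
    typedef struct { double val; double err; } approx_t;

    static const double U = DBL_EPSILON / 2.0;  /* unit roundoff, binary64 */

    static approx_t fp_add(approx_t a, approx_t b)
    {
        approx_t r;
        r.val = a.val + b.val;
        /* propagated error plus the fresh rounding error of this addition
         * (first-order: the bound is taken on the computed value) */
        r.err = a.err + b.err + fabs(r.val) * U;
        return r;
    }

    static approx_t fp_mul(approx_t a, approx_t b)
    {
        approx_t r;
        r.val = a.val * b.val;
        r.err = fabs(a.val) * b.err + fabs(b.val) * a.err + a.err * b.err
              + fabs(r.val) * U;
        return r;
    }

    int main(void)
    {
        /* inputs are taken as exact here; representation error is ignored */
        approx_t a = {0.1, 0.0}, b = {0.2, 0.0}, c = {3.0, 0.0};
        approx_t e = fp_mul(fp_add(a, b), c);   /* (a + b) * c */
        printf("value %.17g, round-off bound %.3g\n", e.val, e.err);
        return 0;
    }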

    Wave Equation Numerical Resolution: a Comprehensive Mechanized Proof of a C Program

    We formally prove correct a C program that implements a numerical scheme for the resolution of the one-dimensional acoustic wave equation. Such an implementation introduces errors at several levels: the numerical scheme introduces method errors, and floating-point computations lead to round-off errors. We annotate this C program to specify both method error and round-off error. We use Frama-C to generate theorems that guarantee the soundness of the code. We discharge these theorems using SMT solvers, Gappa, and Coq. This involves a large Coq development to prove the adequacy of the C program to the numerical scheme and to bound errors. To our knowledge, this is the first time such a numerical analysis program is fully machine-checked. (Research report RR-7826, 2011.)
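
    For orientation, here is a minimal sketch, not the verified program from the paper, of the kind of C code and annotations involved: the second-order finite-difference update for the 1-D wave equation with an illustrative ACSL-style contract of the sort Frama-C turns into proof obligations. The names and the contract are placeholders; the paper's actual annotations also specify method-error and round-off-error bounds.

    #define N 100

    /*@ requires \valid_read(p_prev + (0..N));
      @ requires \valid_read(p_curr + (0..N));
      @ requires \valid(p_next + (0..N));
      @ requires 0.0 < cfl <= 1.0;
      @ assigns p_next[1..N-1];
      @*/
    void wave_step(const double p_prev[N + 1], const double p_curr[N + 1],
                   double p_next[N + 1], double cfl)
    {
        double c2 = cfl * cfl;
        /*@ loop invariant 1 <= i <= N;
          @ loop assigns i, p_next[1..N-1];
          @*/
        for (int i = 1; i < N; i++) {
            /* three-point stencil; the round-off of this expression is what
             * the paper's Gappa/Coq development bounds rigorously */
            p_next[i] = 2.0 * p_curr[i] - p_prev[i]
                      + c2 * (p_curr[i + 1] - 2.0 * p_curr[i] + p_curr[i - 1]);
        }
    }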

    FPGA acceleration of the phylogenetic likelihood function for Bayesian MCMC inference methods

    Background: Maximum likelihood (ML)-based phylogenetic inference has become a popular method for estimating the evolutionary relationships among species based on genomic sequence data. This method is used in applications such as RAxML, GARLI, MrBayes, PAML, and PAUP. The Phylogenetic Likelihood Function (PLF) is an important kernel computation for this method. The PLF consists of a loop with no conditional behavior or dependencies between iterations; as such, it offers high potential for exploiting parallelism using micro-architectural techniques. In this paper, we describe a technique for mapping the PLF and supporting logic onto a Field Programmable Gate Array (FPGA)-based co-processor. By leveraging the FPGA's on-chip DSP modules and the high-bandwidth local memory attached to the FPGA, the resulting co-processor can accelerate ML-based methods and outperform state-of-the-art multi-core processors.

    Results: We use the MrBayes 3 tool as a framework for designing our co-processor. For large datasets, we estimate that our accelerated MrBayes, if run on a current-generation FPGA, achieves a 10× speedup relative to software running on a state-of-the-art server-class microprocessor. The FPGA-based implementation achieves its performance by deeply pipelining the likelihood computations, performing multiple floating-point operations in parallel, and using a natural-log approximation chosen specifically to leverage a deeply pipelined custom architecture.

    Conclusions: Heterogeneous computing, which combines general-purpose processors with special-purpose co-processors such as FPGAs and GPUs, is a promising approach for high-performance phylogeny inference, as shown by the growing body of literature in this field. FPGAs in particular are well suited for this task because of their low power consumption compared to many-core processors and Graphics Processing Units (GPUs).
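
    For readers unfamiliar with the kernel, the schematic C version below shows why the PLF parallelizes so well: iterations over alignment sites are independent. The names are hypothetical, and MrBayes' actual data layout, scaling, and rate categories are omitted; this is only the shape of the computation the FPGA design pipelines.

    /* Combine the conditional likelihoods of two child nodes through their
     * 4x4 transition-probability matrices to get the parent's likelihoods. */
    #define STATES 4

    void plf_node(int n_sites,
                  const double p_left[STATES][STATES],
                  const double p_right[STATES][STATES],
                  const double *cl_left,    /* n_sites * STATES entries */
                  const double *cl_right,
                  double *cl_parent)
    {
        for (int s = 0; s < n_sites; s++) {       /* no cross-iteration deps */
            for (int i = 0; i < STATES; i++) {
                double left = 0.0, right = 0.0;
                for (int j = 0; j < STATES; j++) {
                    left  += p_left[i][j]  * cl_left[s * STATES + j];
                    right += p_right[i][j] * cl_right[s * STATES + j];
                }
                cl_parent[s * STATES + i] = left * right;
            }
        }
    }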

    Fixed-point trigonometric functions on FPGAs


    GPU vs FPGA: A Comparative Analysis for Non-standard Precision

    FPGAs and GPUs are increasingly used in a range of high-performance computing applications. When implementing numerical algorithms on either platform, we can choose to represent operands with different levels of accuracy. A trade-off exists between the numerical accuracy of arithmetic operators and the resources needed to implement them. Where algorithmic requirements for numerical stability are captured in a design description, this trade-off can be exploited to optimize performance by using high-accuracy operators only where they are most required. Support for half and double-double floating-point representations allows additional flexibility to achieve this. The aim of this work is to study the language and hardware support, and the achievable peak performance, for non-standard precisions on a GPU and an FPGA. A compute-intensive program, matrix-matrix multiply, is selected as a benchmark and implemented for various matrix sizes. The results show that for large enough matrices, GPUs outperform FPGA-based implementations, but for some smaller matrix sizes, specialized FPGA floating-point operators for half and double-double precision can deliver higher throughput than an implementation on a GPU.
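
    As background on the double-double format mentioned in the abstract, the C sketch below shows how a value can be carried as an unevaluated sum of two doubles, roughly doubling the significand width. It is based on the classic Knuth two-sum error-free transformation and is not a description of either platform's operator designs.

    #include <stdio.h>

    /* double-double: hi + lo, with |lo| small relative to hi */
    typedef struct { double hi, lo; } dd_t;

    /* error-free transformation: a + b == s + e exactly */
    static void two_sum(double a, double b, double *s, double *e)
    {
        *s = a + b;
        double t = *s - a;
        *e = (a - (*s - t)) + (b - t);
    }

    static dd_t dd_add(dd_t x, dd_t y)
    {
        double s, e;
        two_sum(x.hi, y.hi, &s, &e);
        e += x.lo + y.lo;
        dd_t r;
        two_sum(s, e, &r.hi, &r.lo);   /* renormalize */
        return r;
    }

    int main(void)
    {
        dd_t a = {1.0, 1e-20}, b = {1e-16, 0.0};
        dd_t c = dd_add(a, b);
        printf("hi = %.17g, lo = %.3g\n", c.hi, c.lo);
        return 0;
    }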